
API Reference

Introduction

T-Systems LLM Serving has an API interface compatible with OpenAI's Chat, Completion, Vision, and Embedding APIs. It ensures seamless integration and functionality that matches the capabilities of the original API. The tutorials below show you how to use the low-level OpenAI Python package with our service. Although they demonstrate the compatibility of our API with OpenAI, for RAG use cases in particular please check our Llama-Index and LangChain integration section.

HTTPS API Reference

If you want to look up the POST/GET API endpoints directly, please have a look at the HTTPS API Reference here and here.
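
If you prefer to call the endpoints without the OpenAI client, the following is a minimal sketch using plain HTTP. It assumes the OpenAI-compatible path /chat/completions under the same /v2 base URL and Bearer authentication; check the HTTPS API Reference above for the exact routes.

import os
import requests

# Minimal sketch: direct HTTP call to the chat completions endpoint.
# The path /v2/chat/completions and Bearer auth are assumptions based on
# the OpenAI-compatible client usage shown in the Quickstart below.
response = requests.post(
    "https://llm-server.llmhub.t-systems.net/v2/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('API_KEY')}"},
    json={
        "model": "Llama-3-70B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32,
    },
)
print(response.json())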

Quickstart

Step 0: Install openai package

pip install openai

Step 1: Provide Base URL and API key.

import os
import openai
from openai import OpenAI

print("OpenAI version: ",openai.__version__)
client = OpenAI(
    api_key=os.getenv('API_KEY'),
    base_url="https://llm-server.llmhub.t-systems.net/v2",
)

print("==========Available models==========")
models = client.models.list()
for model in models.data:
    print(model.id)
info

The code above will show all the LLM and embedding models that are available for your key.

Example output:

==========Available models==========
Llama-3-70B-Instruct
Mixtral-8x7B-Instruct-v0.1
CodeLlama-2
jina-embeddings-v2-base-de
text-embedding-bge-m3
gpt-35-turbo
text-embedding-ada-002
gpt-4-32k-canada
gpt-4-32k-france
mistral-large-32k-france
gpt-4-turbo-128k-france

You can get the specific metadata of each model:

models.data[0].meta_data # this is for Llama-3

That will output:

{'max_sequence_length': 8192, 'hidden_size': 0, 'max_output_length': 0}
info
  • max_sequence_length: the maximum combined length of input and output, measured in tokens, that the LLM can process
  • hidden_size: the dimensionality of the embedding vector. This value is only available for embedding models.
  • max_output_length: the maximum number of tokens the LLM can output. e.g. GPT-4-Turbo has a 128k limit for input + output combined, but it can output at most 4096 tokens. (See the sketch below for how to read these fields.)
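
The following sketch combines these fields, assuming every model exposes meta_data as a dict with the same keys as shown above; it flags embedding models by a non-zero hidden_size.

# Sketch: list each model with its limits and flag embedding models.
# Assumes meta_data is a dict with the keys shown in the example output above.
for m in models.data:
    meta = m.meta_data
    kind = "embedding" if meta["hidden_size"] > 0 else "LLM"
    print(f"{m.id} ({kind}): max_sequence_length={meta['max_sequence_length']}, "
          f"max_output_length={meta['max_output_length']}")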

Step 2: Chat API. The role and the content of the prompt are required for each message.

model = "Llama-3-70B-Instruct" # choose one of available LLM (not the embedding model)
stream = True

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant named Llama-3."},
        {"role": "user", "content": "What is Open Telekom Cloud?"},
    ],
    temperature=0.1,
    max_tokens=256,
    stream=stream
)

if not stream:
    print(chat_response.choices[0].message.content)
else:
    for chunk in chat_response:
        if chunk.choices:
            if chunk.choices[0].delta.content is not None:
                print(chunk.choices[0].delta.content, end="", flush=True)

Step 3: Completion API. With this API, the raw text is sent directly to the LLM without a special prompt template.

model = "Llama-3-70B-Instruct" # choose one of available LLM (not the embedding model)
stream = True

completion = client.completions.create(
    model=model,
    prompt="What is the Python programming language?",
    stream=stream,
    temperature=0.2,
    max_tokens=128
)

if not stream:
    print(completion.choices[0].text)
else:
    for chunk in completion:
        if chunk.choices:
            if chunk.choices[0].text is not None:
                print(chunk.choices[0].text, end="", flush=True)

info

The Completions API is only available for open-source models. To get the correct behavior from a model, you need to follow the specific prompt template that the author used during the instruction fine-tuning phase of the LLM, e.g. ## System, ## Assistant. We recommend always using the Chat API instead.
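
For illustration, the sketch below manually wraps a question in such a template before calling the Completions API. The ## System / ## Assistant tags only mirror the example in the note above; the correct template depends on the specific model's fine-tuning, so treat this as an assumption rather than a recommended format.

# Illustrative only: the exact template depends on the model's fine-tuning.
prompt = (
    "## System\n"
    "You are a helpful assistant.\n"
    "## User\n"
    "What is the Python programming language?\n"
    "## Assistant\n"
)

completion = client.completions.create(
    model="Llama-3-70B-Instruct",
    prompt=prompt,
    temperature=0.2,
    max_tokens=128,
)
print(completion.choices[0].text)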

Step 4: Embedding API

model="jina-embeddings-v2-base-de"

texts = ["I am Batman and I'm rich", "I am Spiderman", "I am Ironman and I'm a billionaire", "I am Flash", "I am the president of USA"]
embeddings = client.embeddings.create(
    input=texts, model=model
)

print('Embedding dimension: ',len(embeddings.data[0].embedding))
print('Number of embedding vector: ',len(embeddings.data))
print('Token usage: ',embeddings.usage)

Example output:

Embedding dimension:  1024
Number of embedding vector:  5
Token usage:  Usage(prompt_tokens=31, total_tokens=31)
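
A common next step is to compare the returned vectors, for example with cosine similarity. A minimal sketch follows; numpy is an extra dependency used only for the arithmetic, not something the Embedding API requires.

import numpy as np

# Cosine similarity between the first two returned embedding vectors.
a = np.array(embeddings.data[0].embedding)
b = np.array(embeddings.data[1].embedding)
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print("Cosine similarity between text 0 and text 1:", cosine)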

Step 5: Function calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the user's location.",
                    },
                },
                "required": ["location", "format"],
            },
        }
    },
]
messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."})
messages.append({"role": "user", "content": "What's the weather like today in Hamburg"})

chat_response = client.chat.completions.create(model="gpt-4-turbo-128k-france", messages=messages, tools=tools)

assistant_message = chat_response.choices[0].message

messages.append(assistant_message)  # update the conversation

print(assistant_message)

Example output:

ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_adnFRLazqswLI1ky6FU2O40u', function=Function(arguments='{"location":"Hamburg","format":"celsius"}', name='get_current_weather'), type='function')])
info

Currently, we only support function calling for proprietary models such as the gpt-4 series. Support for open-source models like Llama-3 and Mixtral is coming soon. Stay tuned!
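
The model only returns the function name and arguments; your code is expected to run the function and send the result back as a tool message. The sketch below completes that loop. Here get_current_weather is a hypothetical placeholder you would implement yourself (e.g. by calling a weather service).

import json

# get_current_weather is a placeholder implementation for illustration only.
def get_current_weather(location, format):
    return {"location": location, "temperature": 21, "unit": format}

# Execute each tool call requested by the model and append the results.
for tool_call in assistant_message.tool_calls:
    args = json.loads(tool_call.function.arguments)
    result = get_current_weather(**args)
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    })

# Ask the model to turn the tool result into a final answer.
final_response = client.chat.completions.create(
    model="gpt-4-turbo-128k-france",
    messages=messages,
    tools=tools,
)
print(final_response.choices[0].message.content)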

Multimodal Models

Currently, besides text-generation LLMs, we also provide vision models and speech-to-text models. Here is an example of how to use the OpenAI Vision API with Llava-1.6-34b.

Step 1: Check the available vision models

import os
import openai
from openai import OpenAI

print("OpenAI version: ",openai.__version__)
client = OpenAI(
    api_key=os.getenv('API_KEY'),
    base_url="https://llm-server.llmhub.t-systems.net/vision",
)

print("==========Available models==========")
models = client.models.list()
for model in models.data:
    print(model.id)

Example output:

==========Available models==========
llava-v1.6-34b
llava-v1.6-vicuna-13b

Step 2: Send a question together with the image URL

model = 'llava-v1.6-vicuna-13b'
stream=True

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "You are a helpful AI assistant named LLava that helps people answer their questions based on the image and text provided."}
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        },
    ],
    max_tokens=300,
    stream=stream,
    temperature=0.01
)

if stream:
    for chunk in chat_response:
        if chunk.choices:
            if chunk.choices[0].delta.content is not None:
                print(chunk.choices[0].delta.content, end="")
else:
    print(chat_response.choices[0].message.content)

Example output:

The image shows a serene landscape with a wooden boardwalk or pathway leading through a field of tall grass. 
The pathway appears to be well-maintained and is surrounded by lush greenery.
The sky is partly cloudy, suggesting it might be a pleasant day.
In the distance, there are trees and what looks like a line of bushes or shrubs, which could be part of a hedge or a natural boundary.
The overall scene is peaceful and invites one to imagine a walk through the field.
  • As an alternative to an online image URL, you can pass the local path of an image to a base64 encoder as below:
import base64
import requests
from PIL import Image
from io import BytesIO

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Function to decode base64 back to image (to show on the notebook)
def load_image_from_base64(image):
    return Image.open(BytesIO(base64.b64decode(image)))

# Path to your image
image_path = "/path/to/example_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)
  • Then we can pass the base64_image in the image_url field:
model = 'llava-v1.6-34b'
stream=True

chat_response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
    max_tokens=1000,
    stream=stream,
    temperature=0.01
)

if stream:
    for chunk in chat_response:
        if chunk.choices:
            if chunk.choices[0].delta.content is not None:
                print(chunk.choices[0].delta.content, end="")
else:
    print(chat_response.choices[0].message.content)
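
To preview the locally encoded image in a notebook, the load_image_from_base64 helper defined earlier can be used; this is only a convenience for inspection and is not required by the API.

# Display the encoded image: in a notebook the returned PIL image renders inline;
# in a plain script, call .show() on it instead.
load_image_from_base64(base64_image)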